1. Introduction
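In our previous article, we focused on a malware homology analysis method based on image classification. That method essentially classifies samples according to features of the malware's byte content. From a reverse-engineering perspective, however, it is not interpretable.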
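As is well known, assembly code has a fairly distinct, readable syntax. If malware is first disassembled, natural language processing (NLP) techniques are then used to extract semantic features from the code, and homology analysis is performed on those features, the resulting method is easy to interpret. This is the code-semantics-based homology analysis method introduced in this article. At present, it is used not only in malware detection but also in areas such as code clone search and code infringement determination.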
This article first introduces the background knowledge for code-semantics-based homology analysis, then reviews related work, and finally presents a technical design for code-semantics-based homology analysis and verifies its effectiveness through experiments.
2. Background
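Semantic extraction is the foundation of code-semantics-based malware homology analysis. PV-DM and TextCNN are two common models from the NLP field that are used for code semantic extraction. They are described below: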
(1) Distributed Memory Model of Paragraph Vectors (PV-DM)
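In the PV-DM model, word vectors are concatenated with a paragraph (sentence) vector and used to predict the next word in the text. As a window slides over the sentence, the paragraph vector memorizes the contextual relationships among all the words in it. Because every sample is represented by a paragraph vector of the same dimension regardless of how many instructions it contains, using PV-DM for code semantic extraction is a simple and effective way to solve the problem of inconsistent vector lengths (Figure 1).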

Figure 1. The PV-DM model
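The PV-DM idea described above can be illustrated with a toy, pure-Python sketch. It is not the implementation used in the experiments; the function name `train_pv_dm`, the hyperparameters, and the opcode sequences are assumptions made for illustration, and a library implementation would normally be used in practice. The sketch concatenates a per-sample paragraph vector with the vectors of the preceding `window` tokens and trains a softmax layer to predict the next token.

```python
import math
import random

def train_pv_dm(docs, dim=8, window=2, epochs=50, lr=0.1, seed=0):
    """Toy PV-DM: predict the next token from [paragraph vector ; context word vectors].

    Every document must be longer than `window`, otherwise it yields no
    training examples. Returns (paragraph vectors, average loss per epoch).
    """
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}

    def rand_vec(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    doc_vecs = [rand_vec(dim) for _ in docs]      # one vector per sample
    word_vecs = [rand_vec(dim) for _ in vocab]    # one vector per token
    in_dim = dim * (1 + window)
    out_w = [rand_vec(in_dim) for _ in vocab]     # softmax weights
    losses = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for d, doc in enumerate(docs):
            ids = [index[t] for t in doc]
            for pos in range(window, len(ids)):
                ctx, target = ids[pos - window:pos], ids[pos]
                # Concatenation (not averaging) of paragraph and word vectors.
                x = doc_vecs[d] + [v for c in ctx for v in word_vecs[c]]
                scores = [sum(w * xi for w, xi in zip(row, x)) for row in out_w]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                z = sum(exps)
                probs = [e / z for e in exps]
                total -= math.log(probs[target])
                count += 1
                # Gradient of the cross-entropy loss w.r.t. the scores.
                grad_s = probs[:]
                grad_s[target] -= 1.0
                grad_x = [sum(grad_s[k] * out_w[k][j] for k in range(len(vocab)))
                          for j in range(in_dim)]
                for k, row in enumerate(out_w):
                    for j in range(in_dim):
                        row[j] -= lr * grad_s[k] * x[j]
                for j in range(dim):
                    doc_vecs[d][j] -= lr * grad_x[j]
                for slot, c in enumerate(ctx):
                    base = dim * (1 + slot)
                    for j in range(dim):
                        word_vecs[c][j] -= lr * grad_x[base + j]
        losses.append(total / count)
    return doc_vecs, losses

# Hypothetical opcode sequences of different lengths (7, 4 and 8 tokens).
samples = [
    ["push", "mov", "sub", "call", "add", "pop", "ret"],
    ["push", "mov", "call", "ret"],
    ["xor", "mov", "cmp", "jne", "call", "mov", "pop", "ret"],
]
vectors, loss_curve = train_pv_dm(samples)
print([len(v) for v in vectors])  # every sample gets a vector of the same size
```

The point of the example is the shape of the output: samples with different numbers of instructions all end up with a paragraph vector of length `dim`, which is what makes the representation convenient for a downstream classifier.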
(2) The TextCNN model
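TextCNN concatenates word vectors to turn a text into a matrix and then applies a convolutional neural network to exploit the strengths of deep learning. Compared with an ordinary convolutional neural network, TextCNN applies multiple convolution kernels of different sizes in its convolutional layer (Figure 2). TextCNN has a simple network structure, trains quickly, and performs fairly well. However, its embedding layer uses a pretrained word-vector model (such as Word2Vec) for semantic extraction, so it faces the problem of inconsistent input lengths.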

Figure 2. The TextCNN model
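To make the multi-kernel convolution concrete, here is a minimal pure-Python sketch of the feature-extraction part of a TextCNN: one-dimensional convolutions with kernels of several heights, ReLU, max-over-time pooling, and concatenation. The function names and the tiny hand-written kernels are assumptions for illustration; a real model would learn the kernels and add a classification layer.

```python
def conv1d_max(matrix, kernel):
    """Slide `kernel` (k x dim) down the rows of `matrix` (n x dim),
    apply ReLU, and return the largest activation (max-over-time pooling).
    Returns 0.0 when the input has fewer than k rows."""
    k = len(kernel)
    best = 0.0  # ReLU floor
    for start in range(len(matrix) - k + 1):
        act = sum(kernel[i][j] * matrix[start + i][j]
                  for i in range(k) for j in range(len(kernel[i])))
        best = max(best, act)
    return best

def textcnn_features(matrix, kernels_by_size):
    """Concatenate the pooled output of every kernel of every size."""
    return [conv1d_max(matrix, kern)
            for size in sorted(kernels_by_size)
            for kern in kernels_by_size[size]]

# Hypothetical 2-dimensional "word vectors" for a 3-token and a 5-token sample.
short_sample = [[1, 0], [0, 1], [1, 1]]
long_sample = short_sample + [[0, 0], [2, 0]]
kernels = {
    2: [[[1, 0], [0, 1]]],            # one kernel spanning 2 tokens
    3: [[[1, 1], [1, 1], [1, 1]]],    # one kernel spanning 3 tokens
}
print(textcnn_features(short_sample, kernels))
print(textcnn_features(long_sample, kernels))
```

After pooling, the feature vector has one entry per kernel, so its length no longer depends on the number of tokens. The embedding matrix fed into the convolution still varies in height, which is why inputs are usually padded or truncated to a common length first.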
3. Related Work
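Addressing the family classification of ransomware, Zhang et al. [2] proposed a feature extraction method that converts a sample's instruction sequence into sets of n-grams for different values of n, computes the TF-IDF (term frequency-inverse document frequency) of each n-gram, and selects the t n-grams with the highest TF-IDF values in each family as features. However, n-gram features reflect only sequential patterns and cannot capture the semantic information of the code text.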
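The n-gram and TF-IDF selection described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of [2]: it pools all n-grams of a family into one "document", uses the textbook weighting tf × log(N / df), and the helper names and opcode data are hypothetical.

```python
import math
from collections import Counter

def ngrams(opcodes, n):
    """All contiguous n-grams of an opcode sequence, as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]

def top_tfidf_ngrams(families, n=2, t=2):
    """families: {family name: [opcode sequence, ...]}.
    Returns {family name: the t n-grams with the highest TF-IDF}."""
    counts = {fam: Counter(g for seq in seqs for g in ngrams(seq, n))
              for fam, seqs in families.items()}
    # Document frequency: number of families in which an n-gram occurs.
    df = Counter(g for c in counts.values() for g in c)
    n_families = len(families)
    result = {}
    for fam, c in counts.items():
        total = sum(c.values())
        score = {g: (cnt / total) * math.log(n_families / df[g])
                 for g, cnt in c.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(score, key=lambda g: (-score[g], g))
        result[fam] = ranked[:t]
    return result

# Hypothetical opcode sequences for two families.
families = {
    "famA": [["push", "mov", "call", "ret"], ["xor", "mov", "call", "ret"]],
    "famB": [["xor", "xor", "jmp", "ret"], ["push", "mov", "jmp", "ret"]],
}
print(top_tfidf_ngrams(families))
```

In this toy data the bigram `("push", "mov")` appears in both families, so its inverse document frequency is zero and it is never selected, while bigrams that are frequent in only one family rank highest. This also shows the limitation noted above: the features record which short sequences occur, not what the code means.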
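Chen et al. proposed a code-semantics-based method for determining malware homology [3], which uses Word2Vec to obtain word vectors for instructions and TextCNN for classification. Fang et al. used the FastText model to extract word vectors from JavaScript code [4]; FastText takes multiple words and their n-grams as input and directly outputs the class predicted by the model.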
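Ding et al. proposed Asm2Vec [5], a semantic model for assembly code that extracts the semantic information of instructions. The method is designed on the basis of the distributed memory model of paragraph vectors (PV-DM) and takes into account adaptation to the format of assembly code. Because a control flow graph reflects, to some extent, the dynamic ordering of code, some studies first build the control flow graph of the code and then use techniques such as graph matching and graph neural networks (GNN) to assess code similarity. Although GNNs outperform traditional graph matching, they still fall short in learning semantics. To address this, Yu et al. proposed a method that captures the semantics, structure, and order of code at the same time [6]: it pretrains a BERT model to obtain semantic information, uses a message passing neural network (MPNN) to obtain structural information, and uses a ResNet model to extract order information.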
4. Scheme Design
The code-semantics-based homology analysis scheme consists of two main parts: semantic feature extraction and homology classification training. The processing flow comprises the following steps (Figure 3):
Step 1: Data preparation. Collect samples and label their categories to build the training dataset;
Step 2: Disassembly. Disassemble the malicious Portable Executable (PE) files to obtain assembly code;
Step 3: Preprocessing. Use NLP techniques to preprocess the assembly code, for example by tokenizing it and filtering keywords;
Step 4: Semantic extraction. Build a semantic model, train it on the training data, and extract the semantic features of each sample. This article uses PV-DM and the Word2Vec embedding in TextCNN as semantic extraction models.
Step 5: Homology classification. Based on the semantic features, analyze homology using similarity measures or clustering/classification algorithms. This article uses DNN, KMeans clustering, and CNN.

Figure 3. Code-semantics-based homology analysis workflow
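The preprocessing in Step 3 is not specified in detail, so the following is only one plausible sketch of it. It assumes Intel-syntax disassembly with `;` comments, and the normalization tokens `<imm>` and `<mem>` as well as the function names are hypothetical choices. Each line is split into an opcode and operands, numeric constants and memory references are replaced by placeholder tokens, and "keyword filtering" is interpreted here as keeping only the opcodes.

```python
import re

# Immediate values: 0x-prefixed hex, MASM-style hex such as 4Ch, or decimal.
IMMEDIATE = re.compile(r"^(0x[0-9a-f]+|\d[0-9a-f]*h|\d+)$", re.IGNORECASE)

def tokenize_instruction(line):
    """Split one disassembled line into [opcode, operand, ...] tokens."""
    line = line.split(";", 1)[0].strip()   # drop trailing comments
    if not line:
        return []
    parts = line.split(None, 1)
    tokens = [parts[0].lower()]
    if len(parts) > 1:
        for operand in parts[1].split(","):
            operand = operand.strip().lower()
            if IMMEDIATE.match(operand):
                operand = "<imm>"          # constants vary between samples
            elif "[" in operand:
                operand = "<mem>"          # memory references vary as well
            tokens.append(operand)
    return tokens

def opcode_sequence(lines):
    """Keep only the opcode of every non-empty instruction line."""
    sequence = []
    for line in lines:
        tokens = tokenize_instruction(line)
        if tokens:
            sequence.append(tokens[0])
    return sequence

listing = [
    "push ebp",
    "mov ebp, esp",
    "mov eax, 0x10      ; load a constant",
    "push dword ptr [ebp+8]",
    "call eax",
    "ret",
]
print([tokenize_instruction(line) for line in listing])
print(opcode_sequence(listing))
```

Replacing addresses and constants with placeholders keeps the vocabulary small, so that two variants of the same family that differ only in offsets map to the same token sequence.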
5. Experimental Analysis
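This section experimentally validates the homology analysis methods built on the two code semantic models. The samples were obtained from the Internet and cover the categories Application, Backdoor, Generic, Trojan, Variant, Virus, and Worm, among others (Table 1).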

Table 1. Experimental dataset
Experiment 1: Homology analysis based on the PV-DM model
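Figure 4 shows the training process of the PV-DM semantic model. A 256-dimensional semantic vector is extracted for each sample and classified with a neural network. With the data split 4:1 into a training set and a test set, the overall accuracy is 0.74. In addition, the extracted semantic features were clustered with the KMeans algorithm, and the test accuracy is likewise 0.74.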

Figure 4. Training and testing of the PV-DM-based DNN model

Figure 5. PV-DM-based KMeans clustering (Accuracy = 0.74)
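The article does not state how the data were shuffled for the 4:1 split or how an accuracy figure was derived from unlabeled KMeans clusters. The sketch below shows one common convention for each, as an assumption: a seeded random 4:1 split, and a clustering accuracy in which every cluster is assigned the majority label of its members. The function names and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_4_to_1(items, seed=0):
    """Shuffle a copy of `items` and split it 4:1 into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

def clustering_accuracy(cluster_ids, labels):
    """Map each cluster to its majority label and return the fraction of
    samples whose true label matches the label of their cluster."""
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        members[cluster].append(label)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return correct / len(labels)

# Hypothetical cluster assignments and ground-truth categories.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["Trojan", "Trojan", "Worm", "Worm", "Worm", "Virus", "Virus", "Trojan"]
print(clustering_accuracy(cluster_ids, labels))

train, test = split_4_to_1(range(10))
print(len(train), len(test))
```

With majority-label mapping, the score can only be as good as the purity of the clusters, so it is most meaningful when the number of clusters matches the number of categories.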
Experiment 2: Homology analysis based on TextCNN
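Figure 6 shows the distribution of the number of instructions per sample: the average is 28, the minimum is 1 (195 samples), and the maximum is 74 (1 sample). A TextCNN model is built with one-dimensional convolution kernels of different sizes, and the resulting feature maps are max-pooled and concatenated. The dataset is split 4:1 into a training set and a validation set. As shown in Figure 7, the test accuracy is about 0.65.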

Figure 6. Instruction count statistics

Figure 7. TextCNN training and testing
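Since the samples contain between 1 and 74 instructions, the token sequences have to be brought to a common length before they can be stacked into the input matrix of a TextCNN. The article does not say how this was done; the sketch below assumes the usual approach of padding with a reserved id and truncating, with the target length of 74 taken from the maximum reported above. The function name and `pad_id=0` are hypothetical.

```python
def pad_or_truncate(token_ids, length=74, pad_id=0):
    """Return exactly `length` ids: cut long sequences, pad short ones."""
    clipped = list(token_ids[:length])
    return clipped + [pad_id] * (length - len(clipped))

# A one-instruction sample and an over-long sample both end up 74 ids long.
print(len(pad_or_truncate([5])))
print(len(pad_or_truncate(list(range(1, 101)))))
```

With an average of 28 instructions per sample, most of a padded input consists of the padding id, which may be one reason to consider a shorter target length in practice.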
6. Summary
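The experiments in this article show that code-semantics-based malware homology analysis is feasible to a certain degree. However, when PV-DM and TextCNN are applied directly to assembly code, they treat the assembly code entirely as plain text, so the accuracy of semantic extraction is somewhat low. The method in [5] is a semantic extraction method designed specifically for assembly code and can extract semantic information more accurately; our follow-up work will study this method further.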
References
[1] Intelligent Security Research Group. Artificial Intelligence Security | AI Security Applications | Homology Analysis Based on Image Classification. 2021.10.15.
[2] Hanqi Zhang, Xi Xiao. Classification of ransomware families with machine learning based on N-gram of opcodes[J]. Future Generation Computer Systems, 2019(90): 211-221.
[3] H. Chen, Yue Wu, Futai Zou. A malicious code homology determination method based on Asm2Vec[J]. Communications Technology, 2019, 52(12): 3010-3015.
[4] Yong Fang, Cheng Huang. Detecting malicious JavaScript code based on semantic analysis[J]. Computers & Security, 2020(93): 1-9.
[5] Steven H H Ding, Benjamin C M Fung. Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization[C]. S&P, 2019: 1-18.
[6] Zeping Yu, Rui Cao, Qiyi Tang, et al. Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection[C]. AAAI, 2020: 1-8.